Automatically incorporating unknown words in JUPITER

Author

  • Grace Chung

Abstract

This paper concerns the handling of out-of-vocabulary (OOV) words in the JUPITER weather information system. Specifically, our objective is to handle weather queries about unknown cities. We have implemented a system that can detect the presence of an unknown city name and immediately propose a plausible spelling for it. The city can then potentially be incorporated dynamically into the recognizer lexicon. The three-stage system described in [1] was implemented in the JUPITER domain; this paper details the development of a system that uses an ANGIE-based framework to model spelling and pronunciation simultaneously, and that uses automatically derived novel lexical units in the first stage. We report results on an independent test set containing unknown cities. Compared with a single-stage baseline, the three-stage configuration reduced word error by 29.3% relative (from 24.6% to 17.4%) and understanding error by 67.5% relative (from 67.0% to 21.8%).
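The reductions quoted in the abstract are relative: the drop in error rate divided by the baseline error rate. A minimal sketch that checks this arithmetic against the reported figures; the function name `relative_reduction` is illustrative only and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction, in percent, of `improved` versus `baseline`.

    Both arguments are error rates in percent (e.g. 24.6 for 24.6%).
    """
    return 100.0 * (baseline - improved) / baseline


# Error rates (%) quoted in the abstract: single-stage baseline vs. three-stage system.
word = relative_reduction(24.6, 17.4)
understanding = relative_reduction(67.0, 21.8)
print(f"word error reduction: {word:.1f}%")
print(f"understanding error reduction: {understanding:.1f}%")
```

Rounded to one decimal place, these reproduce the 29.3% and 67.5% figures, confirming that the abstract reports relative rather than absolute reductions (the absolute drops are 7.2 and 45.2 percentage points).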

Similar sources

A three-stage solution for flexible vocabulary speech understanding

This paper discusses our three-stage approach to a flexible vocabulary speech understanding system, which can detect out-of-vocabulary (OOV) words and hypothesize their phonetic and orthographic transcriptions. In the first stage, we introduce the column-bigram finite-state transducer (FST) which, while embedding ANGIE sublexical models, also supports previously unseen data from unknown words. ...

Translating Chinese Unknown Words by Automatically Acquired Templates

In this paper, we present a translation template model to translate Chinese unknown words. The model exploits translation templates, which are extracted automatically from a word-aligned parallel corpus, to translate unknown words. The translation templates are designed in accordance with the structure of unknown words. When an unknown word is detected during translation, the model applies tran...

Automatically Generated Models for Unknown Words

Especially in the recognition of spontaneous speech, it is necessary to cope with the occurrence of unknown words. We present an approach to unknown word detection which is integrated into a standard HMM speech recognizer. From the context-dependent sub-word units, e.g. triphones, that can be found in the training database, a generic word model can be derived automatically using the context restricti...

Publication date: 2000